On Concise Set of Relative Candidate Keys

Authors

  • Shaoxu Song
  • Lei Chen
  • Hong Cheng
Abstract

Matching keys, specifying what attributes to compare and how to compare them for identifying the same real-world entities, are found to be useful in applications like record matching, blocking and windowing [7]. Owing to the complex redundant semantics among matching keys, capturing a proper set of matching keys is highly non-trivial. Analogous to minimal/candidate keys w.r.t. functional dependencies, relative candidate keys (rcks [7], with a minimal number of compared attributes, see a more formal definition in Section 2) can clear up redundant semantics w.r.t. “what attributes to compare”. However, we note that redundancy issues may still exist among rcks on the same attributes about “how to compare them”. In this paper, we propose to find a concise set of matching keys, which has less redundancy and can still meet the requirements on coverage and validity. Specifically, we study approximation algorithms to efficiently discover a near optimal set. To ensure the quality of matching keys, the returned results are guaranteed to be rcks (minimal on compared attributes), and most importantly, minimal w.r.t. distance restrictions (i.e., redundancy free w.r.t. “how to compare the attributes”). The experimental evaluation demonstrates that our concise rck set is more effective than the existing rck choosing method. Moreover, the proposed pruning methods show up to 2 orders of magnitude improvement w.r.t. time costs on concise rck set discovery.
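
As a rough illustration of the two questions in the abstract ("what attributes to compare" and "how to compare them"), the sketch below models a matching key as a mapping from attribute names to maximum allowed edit distances. This representation, the function names (`edit_distance`, `matches`, `at_least_as_strict`), and the sample records are assumptions made for illustration; they are not the paper's formal definitions or its discovery algorithm.

```python
from typing import Dict

# Hypothetical representation: attribute name -> maximum allowed edit distance.
MatchingKey = Dict[str, int]
Record = Dict[str, str]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def matches(key: MatchingKey, r1: Record, r2: Record) -> bool:
    """True if the two records are within the threshold on every compared attribute."""
    return all(edit_distance(r1[attr], r2[attr]) <= limit
               for attr, limit in key.items())


def at_least_as_strict(k1: MatchingKey, k2: MatchingKey) -> bool:
    """True if k1 compares the same attributes as k2 with thresholds no larger.

    In that case every record pair matched by k1 is also matched by k2,
    which is one way two keys on the same attributes can overlap.
    """
    return k1.keys() == k2.keys() and all(k1[a] <= k2[a] for a in k1)


if __name__ == "__main__":
    r1 = {"name": "Jon Smith", "city": "Leeds"}
    r2 = {"name": "John Smith", "city": "Leeds"}
    loose = {"name": 1, "city": 0}
    strict = {"name": 0, "city": 0}
    print(matches(loose, r1, r2), matches(strict, r1, r2))
    print(at_least_as_strict(strict, loose))
```

The two keys in the usage example compare the same attributes and differ only in their distance thresholds, which is the kind of overlap the abstract refers to as redundancy about "how to compare them".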

Similar resources

Children Mortality in Iran: Moving Ahead with the Sustainable Development Goals

The world is approaching a secular milestone in reaching the Millennium Development Goals (MDGs). After December 2015, a new set of flexible and global Sustainable Development Goals (SDGs) was set, replacing the MDGs. Infant mortality rate (IMR) is a pivotal indicator of development in a given country and is embedded in the Millennium Development Goals (MDGs). After that in manner of strong clini...

Relative n-th non-commuting graphs of finite groups

Suppose $n$ is a fixed positive integer. We introduce the relative n-th non-commuting graph $\Gamma^{n}_{H,G}$, associated to the non-abelian subgroup $H$ of group $G$. The vertex set is $G\setminus C^n_{H,G}$ in which $C^n_{H,G} = \{x\in G : [x,y^{n}]=1 \mbox{~and~} [x^{n},y]=1 \mbox{~for~all~} y\in H\}$. Moreover, $\{x,y\}$ is an edge if $x$ or $y$ belong to $H$ and $xy^{n}\neq y^{n}x$ or $x...

Association Rule Mining based on Apriori Algorithm in Minimizing Candidate Generation

Association Rule Mining is an area of data mining that focuses on pruning candidate keys. The Apriori algorithm is the most commonly used Association Rule Mining algorithm. This algorithm has limitations, which gives the opportunity for this research. This paper introduces a new way in which the Apriori algorithm can be improved. The modified algorithm introduces factors such as set size and ...

What can FCA do for database linkkey extraction? (problem paper)

Links between heterogeneous data sets may be found by using a generalisation of keys in databases, called linkkeys, which apply across data sets. This paper considers the question of characterising such keys in terms of formal concept analysis. This question is natural because the space of candidate keys is an ordered structure obtained by reduction of the space of keys and that of data set par...

Unintentional Childhood Poisoning: a Neglected Subject in Iran

According to uneven epidemiologic studies on health outcomes in Iran, some adverse outcomes, such as poisoning, especially in children, are less noticed. We were interested in writing some childhood poisoning factsheets to provide easy-to-understand information on pitfalls and challenges around epidemiologic measures, etiology and prevention in this context. Valid evidence about childhood poisoning i...

Journal title:
  • PVLDB

Volume 7  Issue

Pages  -

Publication date 2014